Overview

Dataset statistics

Number of variables: 12
Number of observations: 871
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 395.6 KiB
Average record size in memory: 465.1 B
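The average record size appears to be the total in-memory size divided by the number of observations. The check below is a minimal sketch of that arithmetic; it assumes the report uses 1 KiB = 1024 bytes and works from the rounded total shown above, so it can only be approximate.

```python
KIB = 1024  # bytes per kibibyte (assumed unit convention)

total_size_kib = 395.6  # "Total size in memory" from the overview
n_observations = 871    # "Number of observations" from the overview

# Average bytes per row, derived from the two overview figures.
avg_record_bytes = total_size_kib * KIB / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result agrees with the reported 465.1 B to one decimal place.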

Variable types

CAT: 6
NUM: 6

Reproduction

Analysis started: 2020-07-24 17:24:17.111697
Analysis finished: 2020-07-24 17:24:24.655649
Duration: 7.54 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

home_team has a high cardinality: 160 distinct values
away_team has a high cardinality: 159 distinct values
match has a high cardinality: 798 distinct values
df_index is uniformly distributed
match is uniformly distributed
df_index has unique values
home_team_score has 224 (25.7%) zeros
away_team_score has 353 (40.5%) zeros
total_scores has 88 (10.1%) zeros

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 871
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 479.7267508610792
Minimum: 0
Maximum: 956
Zeros: 1
Zeros (%): 0.1%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 50.5
Q1: 239.5
Median: 483
Q3: 717.5
95-th percentile: 905.5
Maximum: 956
Range: 956
Interquartile range (IQR): 478

Descriptive statistics

Standard deviation: 275.7583503
Coefficient of variation (CV): 0.5748237926
Kurtosis: -1.208298522
Mean: 479.7267509
Median Absolute Deviation (MAD): 239
Skewness: -0.01611045344
Sum: 417842
Variance: 76042.66778
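Several of the df_index figures can be derived from one another. The sketch below recomputes the mean, variance, coefficient of variation, interquartile range and range from the other reported values; the variable names are illustrative, and the inputs are the rounded numbers printed in this section.

```python
# Figures copied from the df_index section of the report.
n = 871
total = 417842            # Sum
std = 275.7583503         # Standard deviation
q1, q3 = 239.5, 717.5     # Q1 and Q3
minimum, maximum = 0, 956

mean = total / n                 # Mean = Sum / number of observations
variance = std ** 2              # Variance = standard deviation squared
cv = std / mean                  # Coefficient of variation = std / mean
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(mean, variance, cv, iqr, value_range)
```

Each derived value agrees with the corresponding reported statistic up to rounding.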
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
956 | 1 | 0.1%
317 | 1 | 0.1%
328 | 1 | 0.1%
327 | 1 | 0.1%
326 | 1 | 0.1%
325 | 1 | 0.1%
324 | 1 | 0.1%
323 | 1 | 0.1%
322 | 1 | 0.1%
321 | 1 | 0.1%
Other values (861) | 861 | 98.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
956 | 1 | 0.1%
955 | 1 | 0.1%
954 | 1 | 0.1%
953 | 1 | 0.1%
952 | 1 | 0.1%

home_team
Categorical

HIGH CARDINALITY
Distinct count: 160
Unique (%): 18.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values
Value | Count | Frequency (%)
Mexico | 23 | 2.6%
Portugal | 15 | 1.7%
Estonia | 15 | 1.7%
Saudi Arabia | 15 | 1.7%
Greece | 14 | 1.6%
Oman | 14 | 1.6%
Hungary | 14 | 1.6%
Denmark | 13 | 1.5%
England | 13 | 1.5%
Austria | 12 | 1.4%
Other values (150) | 723 | 83.0%

Length

Max length: 22
Mean length: 7.947187141
Min length: 4

Unicode categories
Value | Count | Frequency (%)
Lowercase_Letter | 26 | 50.0%
Uppercase_Letter | 25 | 48.1%
Space_Separator | 1 | 1.9%

Unicode scripts
Value | Count | Frequency (%)
Latin | 51 | 98.1%
Common | 1 | 1.9%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 52 | 100.0%

home_team_score
Real number (ℝ≥0)

ZEROS
Distinct count: 14
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.632606199770379
Minimum: 0
Maximum: 31
Zeros: 224
Zeros (%): 25.7%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 31
Range: 31
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.905959795
Coefficient of variation (CV): 1.167433883
Kurtosis: 68.55786471
Mean: 1.6326062
Median Absolute Deviation (MAD): 1
Skewness: 5.524378027
Sum: 1422
Variance: 3.63268274

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
1 | 264 | 30.3%
0 | 224 | 25.7%
2 | 205 | 23.5%
3 | 97 | 11.1%
4 | 35 | 4.0%
5 | 23 | 2.6%
6 | 8 | 0.9%
8 | 4 | 0.5%
7 | 4 | 0.5%
9 | 3 | 0.3%
Other values (4) | 4 | 0.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 224 | 25.7%
1 | 264 | 30.3%
2 | 205 | 23.5%
3 | 97 | 11.1%
4 | 35 | 4.0%

Maximum 5 values
Value | Count | Frequency (%)
31 | 1 | 0.1%
15 | 1 | 0.1%
11 | 1 | 0.1%
10 | 1 | 0.1%
9 | 3 | 0.3%
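
The frequency counts for home_team_score are complete enough to recompute the summary statistics by hand. The sketch below is illustrative only: it assumes the four "Other values" are the scores 10, 11, 15 and 31 listed among the largest values, and it uses the sample variance (divisor n - 1) because that choice reproduces the reported figure.

```python
import math

# value -> count, copied from the home_team_score frequency tables.
# Assumption: the four "Other values" are 10, 11, 15 and 31 (one match each).
counts = {0: 224, 1: 264, 2: 205, 3: 97, 4: 35, 5: 23, 6: 8,
          7: 4, 8: 4, 9: 3, 10: 1, 11: 1, 15: 1, 31: 1}

n = sum(counts.values())                       # number of observations
total = sum(v * c for v, c in counts.items())  # sum of all scores
mean = total / n

# Sample variance (divisor n - 1), chosen because it matches the report.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                   # coefficient of variation
zeros_pct = 100 * counts[0] / n   # share of zero scores, in percent

print(n, total, mean, variance, std, cv, zeros_pct)
```

Under these assumptions the counts add up to 871 observations and a sum of 1422, and the variance, standard deviation, coefficient of variation and zero share agree with the reported values up to rounding.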
 

away_team
Categorical

HIGH CARDINALITY
Distinct count: 159
Unique (%): 18.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values
Value | Count | Frequency (%)
Argentina | 15 | 1.7%
Uruguay | 15 | 1.7%
Kuwait | 14 | 1.6%
Bulgaria | 13 | 1.5%
Israel | 13 | 1.5%
Sweden | 12 | 1.4%
Spain | 12 | 1.4%
Paraguay | 12 | 1.4%
Norway | 12 | 1.4%
Singapore | 11 | 1.3%
Other values (149) | 742 | 85.2%

Length

Max length: 24
Mean length: 7.835820896
Min length: 4

Unicode categories
Value | Count | Frequency (%)
Lowercase_Letter | 26 | 50.0%
Uppercase_Letter | 25 | 48.1%
Space_Separator | 1 | 1.9%

Unicode scripts
Value | Count | Frequency (%)
Latin | 51 | 98.1%
Common | 1 | 1.9%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 52 | 100.0%

away_team_score
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0045924225028702
Minimum: 0
Maximum: 9
Zeros: 353
Zeros (%): 40.5%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 3
Maximum: 9
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.154691395
Coefficient of variation (CV): 1.149412806
Kurtosis: 5.240328362
Mean: 1.004592423
Median Absolute Deviation (MAD): 1
Skewness: 1.746568561
Sum: 875
Variance: 1.333312219

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 353 | 40.5%
1 | 292 | 33.5%
2 | 148 | 17.0%
3 | 47 | 5.4%
4 | 20 | 2.3%
5 | 5 | 0.6%
6 | 3 | 0.3%
7 | 2 | 0.2%
9 | 1 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 353 | 40.5%
1 | 292 | 33.5%
2 | 148 | 17.0%
3 | 47 | 5.4%
4 | 20 | 2.3%

Maximum 5 values
Value | Count | Frequency (%)
9 | 1 | 0.1%
7 | 2 | 0.2%
6 | 3 | 0.3%
5 | 5 | 0.6%
4 | 20 | 2.3%

year
Categorical

Distinct count: 26
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values
Value | Count | Frequency (%)
2010 | 141 | 16.2%
2002 | 78 | 9.0%
2004 | 62 | 7.1%
2007 | 57 | 6.5%
2000 | 57 | 6.5%
2009 | 54 | 6.2%
1998 | 53 | 6.1%
1999 | 44 | 5.1%
2005 | 42 | 4.8%
2006 | 41 | 4.7%
Other values (16) | 242 | 27.8%

Length

Max length: 4
Mean length: 4
Min length: 4

Unicode categories
Value | Count | Frequency (%)
Decimal_Number | 10 | 100.0%

Unicode scripts
Value | Count | Frequency (%)
Common | 10 | 100.0%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 10 | 100.0%

home_team_rank
Real number (ℝ≥0)

Distinct count: 181
Unique (%): 20.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65.54994259471871
Minimum: 1.0
Maximum: 206.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 27
Median: 58
Q3: 95
95-th percentile: 160
Maximum: 206
Range: 205
Interquartile range (IQR): 68

Descriptive statistics

Standard deviation: 47.55799336
Coefficient of variation (CV): 0.725523036
Kurtosis: -0.1488295399
Mean: 65.54994259
Median Absolute Deviation (MAD): 34
Skewness: 0.7530868154
Sum: 57094
Variance: 2261.762733

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
33 | 14 | 1.6%
27 | 13 | 1.5%
8 | 12 | 1.4%
11 | 12 | 1.4%
15 | 12 | 1.4%
12 | 12 | 1.4%
67 | 11 | 1.3%
23 | 10 | 1.1%
6 | 10 | 1.1%
45 | 10 | 1.1%
Other values (171) | 755 | 86.7%

Minimum 5 values
Value | Count | Frequency (%)
1 | 8 | 0.9%
2 | 7 | 0.8%
3 | 7 | 0.8%
4 | 6 | 0.7%
5 | 9 | 1.0%

Maximum 5 values
Value | Count | Frequency (%)
206 | 1 | 0.1%
203 | 1 | 0.1%
201 | 2 | 0.2%
200 | 1 | 0.1%
199 | 1 | 0.1%

away_team_rank
Real number (ℝ≥0)

Distinct count: 188
Unique (%): 21.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.91848450057405
Minimum: 1.0
Maximum: 209.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 28
Median: 61
Q3: 102.5
95-th percentile: 169.5
Maximum: 209
Range: 208
Interquartile range (IQR): 74.5

Descriptive statistics

Standard deviation: 50.4976523
Coefficient of variation (CV): 0.7222360819
Kurtosis: -0.361778495
Mean: 69.9184845
Median Absolute Deviation (MAD): 36
Skewness: 0.6669445275
Sum: 60899
Variance: 2550.012888

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
20 | 12 | 1.4%
30 | 12 | 1.4%
36 | 12 | 1.4%
23 | 11 | 1.3%
2 | 11 | 1.3%
14 | 11 | 1.3%
4 | 11 | 1.3%
37 | 11 | 1.3%
3 | 11 | 1.3%
1 | 10 | 1.1%
Other values (178) | 759 | 87.1%

Minimum 5 values
Value | Count | Frequency (%)
1 | 10 | 1.1%
2 | 11 | 1.3%
3 | 11 | 1.3%
4 | 11 | 1.3%
5 | 6 | 0.7%

Maximum 5 values
Value | Count | Frequency (%)
209 | 1 | 0.1%
204 | 1 | 0.1%
203 | 3 | 0.3%
202 | 1 | 0.1%
201 | 2 | 0.2%

tournament
Categorical

Distinct count: 38
Unique (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values
Value | Count | Frequency (%)
Friendly | 574 | 65.9%
FIFA World Cup qualification | 134 | 15.4%
UEFA Euro qualification | 37 | 4.2%
AFC Asian Cup qualification | 31 | 3.6%
Cyprus International Tournament | 8 | 0.9%
Gulf Cup | 7 | 0.8%
AFF Championship | 7 | 0.8%
CECAFA Cup | 6 | 0.7%
Copa América | 6 | 0.7%
African Cup of Nations qualification | 5 | 0.6%
Other values (28) | 56 | 6.4%

Length

Max length: 42
Mean length: 13.52009185
Min length: 7

Unicode categories
Value | Count | Frequency (%)
Lowercase_Letter | 24 | 58.5%
Uppercase_Letter | 15 | 36.6%
Other_Punctuation | 1 | 2.4%
Space_Separator | 1 | 2.4%

Unicode scripts
Value | Count | Frequency (%)
Latin | 39 | 95.1%
Common | 2 | 4.9%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 39 | 100.0%

game_results
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values
Value | Count | Frequency (%)
Win | 420 | 48.2%
Draw | 226 | 25.9%
Lose | 225 | 25.8%

Length

Max length: 4
Mean length: 3.517795637
Min length: 3

Unicode categories
Value | Count | Frequency (%)
Lowercase_Letter | 8 | 72.7%
Uppercase_Letter | 3 | 27.3%

Unicode scripts
Value | Count | Frequency (%)
Latin | 11 | 100.0%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 11 | 100.0%

total_scores
Real number (ℝ≥0)

ZEROS
Distinct count: 14
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.637198622273249
Minimum: 0
Maximum: 31
Zeros: 88
Zeros (%): 10.1%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 4
95-th percentile: 6
Maximum: 31
Range: 31
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.093601298
Coefficient of variation (CV): 0.7938731958
Kurtosis: 40.0930993
Mean: 2.637198622
Median Absolute Deviation (MAD): 1
Skewness: 3.743740762
Sum: 2297
Variance: 4.383166396

Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
2 | 233 | 26.8%
3 | 173 | 19.9%
1 | 153 | 17.6%
4 | 115 | 13.2%
0 | 88 | 10.1%
5 | 51 | 5.9%
6 | 25 | 2.9%
7 | 15 | 1.7%
8 | 7 | 0.8%
10 | 4 | 0.5%
Other values (4) | 7 | 0.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 88 | 10.1%
1 | 153 | 17.6%
2 | 233 | 26.8%
3 | 173 | 19.9%
4 | 115 | 13.2%

Maximum 5 values
Value | Count | Frequency (%)
31 | 1 | 0.1%
15 | 1 | 0.1%
11 | 1 | 0.1%
10 | 4 | 0.5%
9 | 4 | 0.5%

match
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 798
Unique (%): 91.6%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 KiB

Common values
Value | Count | Frequency (%)
Honduras Vs Colombia | 3 | 0.3%
Lithuania Vs Belarus | 3 | 0.3%
Ecuador Vs Venezuela | 3 | 0.3%
Mexico Vs Paraguay | 3 | 0.3%
Peru Vs Chile | 3 | 0.3%
Bolivia Vs Uruguay | 3 | 0.3%
Costa Rica Vs Jamaica | 2 | 0.2%
Albania Vs Greece | 2 | 0.2%
El Salvador Vs Honduras | 2 | 0.2%
Hungary Vs Switzerland | 2 | 0.2%
Other values (788) | 845 | 97.0%

Length

Max length: 42
Mean length: 19.78300804
Min length: 12

Unicode categories
Value | Count | Frequency (%)
Lowercase_Letter | 26 | 50.0%
Uppercase_Letter | 25 | 48.1%
Space_Separator | 1 | 1.9%

Unicode scripts
Value | Count | Frequency (%)
Latin | 51 | 98.1%
Common | 1 | 1.9%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 52 | 100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
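
The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; the function name and the sample data are made up for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting and rescaling y (a change of location and scale) leaves r unchanged.
r_rescaled = pearson_r(x, [3 * b + 7 for b in y])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale mentioned above.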

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
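
As a rough sketch of this recipe, the example below ranks both variables and then applies the Pearson formula to the ranks. Giving tied values the mean of their positions is an assumption of this example (a common convention), and all names and data are illustrative.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this monotonic but nonlinear data, ρ reaches 1 while Pearson's r stays below 1.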

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
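
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; it treats tied pairs as neither concordant nor discordant, which is an assumption of this example. Libraries such as SciPy default to the tie-adjusted tau-b, so their results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; it counts toward neither total.
    return (concordant - discordant) / len(pairs)

# Hypothetical data: 6 pairs, of which 5 are concordant and 1 is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```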

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
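
A minimal sketch of a bias-corrected Cramér's V, following the correction commonly attributed to Bergsma (2013), is shown below. It is an illustration computed from a contingency table of counts, not necessarily the exact code path the report uses; the function name and the example tables are hypothetical.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical 2x2 tables: no association versus perfect association.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
print(v_independent, v_perfect)
```

The two example tables hit the ends of the range: 0 for independent variables and 1 for perfectly associated ones.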

Missing values

Sample

First rows

(index) | df_index | home_team | home_team_score | away_team | away_team_score | year | home_team_rank | away_team_rank | tournament | game_results | total_scores | match
0 | 0 | Bolivia | 3 | Uruguay | 1 | 1993 | 59.0 | 22.0 | FIFA World Cup qualification | Win | 4 | Bolivia Vs Uruguay
1 | 1 | Brazil | 1 | Mexico | 1 | 1993 | 8.0 | 14.0 | Friendly | Draw | 2 | Brazil Vs Mexico
2 | 2 | Ecuador | 5 | Venezuela | 0 | 1993 | 35.0 | 94.0 | FIFA World Cup qualification | Win | 5 | Ecuador Vs Venezuela
3 | 3 | Guinea | 1 | Sierra Leone | 0 | 1993 | 65.0 | 86.0 | Friendly | Win | 1 | Guinea Vs Sierra Leone
4 | 4 | Paraguay | 1 | Argentina | 3 | 1993 | 67.0 | 5.0 | FIFA World Cup qualification | Lose | 4 | Paraguay Vs Argentina
5 | 5 | Peru | 0 | Colombia | 1 | 1993 | 70.0 | 19.0 | FIFA World Cup qualification | Lose | 1 | Peru Vs Colombia
6 | 7 | Saudi Arabia | 1 | Costa Rica | 2 | 1993 | 44.0 | 38.0 | Friendly | Lose | 3 | Saudi Arabia Vs Costa Rica
7 | 9 | Israel | 1 | Ukraine | 0 | 1994 | 54.0 | 90.0 | Friendly | Win | 1 | Israel Vs Ukraine
8 | 11 | England | 5 | Greece | 0 | 1994 | 15.0 | 32.0 | Friendly | Win | 5 | England Vs Greece
9 | 12 | Poland | 3 | Austria | 4 | 1994 | 27.0 | 41.0 | Friendly | Lose | 7 | Poland Vs Austria

Last rows

(index) | df_index | home_team | home_team_score | away_team | away_team_score | year | home_team_rank | away_team_rank | tournament | game_results | total_scores | match
861 | 946 | Mexico | 3 | Republic of Ireland | 1 | 2017 | 17.0 | 26.0 | Friendly | Win | 4 | Mexico Vs Republic of Ireland
862 | 947 | Nigeria | 3 | Togo | 0 | 2017 | 38.0 | 112.0 | Friendly | Win | 3 | Nigeria Vs Togo
863 | 948 | Switzerland | 1 | Belarus | 0 | 2017 | 9.0 | 83.0 | Friendly | Win | 1 | Switzerland Vs Belarus
864 | 950 | Grenada | 0 | Barbados | 2 | 2017 | 163.0 | 181.0 | Windward Islands Tournament | Lose | 2 | Grenada Vs Barbados
865 | 951 | Uganda | 0 | Namibia | 1 | 2018 | 73.0 | 111.0 | African Nations Championship | Lose | 1 | Uganda Vs Namibia
866 | 952 | England | 2 | Costa Rica | 0 | 2018 | 12.0 | 23.0 | Friendly | Win | 2 | England Vs Costa Rica
867 | 953 | Uruguay | 3 | Uzbekistan | 0 | 2018 | 14.0 | 95.0 | Friendly | Win | 3 | Uruguay Vs Uzbekistan
868 | 954 | Portugal | 3 | Algeria | 0 | 2018 | 4.0 | 66.0 | Friendly | Win | 3 | Portugal Vs Algeria
869 | 955 | Iceland | 2 | Ghana | 2 | 2018 | 22.0 | 47.0 | Friendly | Draw | 4 | Iceland Vs Ghana
870 | 956 | India | 1 | New Zealand | 2 | 2018 | 97.0 | 120.0 | Intercontinental Cup | Lose | 3 | India Vs New Zealand
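
The sample rows suggest that game_results, total_scores and match are derived from the team and score columns. That relationship is an inference from the rows shown, not something the report states. The sketch below checks it against a few of the sample rows; the `derive` helper is hypothetical.

```python
# (home_team, home_team_score, away_team, away_team_score,
#  game_results, total_scores, match), copied from the sample rows.
rows = [
    ("Bolivia", 3, "Uruguay", 1, "Win", 4, "Bolivia Vs Uruguay"),
    ("Brazil", 1, "Mexico", 1, "Draw", 2, "Brazil Vs Mexico"),
    ("Paraguay", 1, "Argentina", 3, "Lose", 4, "Paraguay Vs Argentina"),
    ("Iceland", 2, "Ghana", 2, "Draw", 4, "Iceland Vs Ghana"),
    ("India", 1, "New Zealand", 2, "Lose", 3, "India Vs New Zealand"),
]

def derive(home, home_score, away, away_score):
    """Assumed derivation: result from the home team's point of view."""
    if home_score > away_score:
        result = "Win"
    elif home_score < away_score:
        result = "Lose"
    else:
        result = "Draw"
    return result, home_score + away_score, f"{home} Vs {away}"

# True for each row whose derived columns match the reported ones.
checks = [
    derive(home, hs, away, as_) == (result, total, match)
    for home, hs, away, as_, result, total, match in rows
]
print(all(checks))
```

If the relationship holds for the full dataset, these three columns carry no information beyond the team and score columns, which is worth keeping in mind when reading the correlation results.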